BioAR: Anaphora Resolution For Relating Protein Names To Proteome Database Entries

Authors

  • Jung-Jae Kim
  • Jong C. Park

Abstract

The need to associate, or ground, protein names in the literature with entries in proteome databases such as Swiss-Prot is well recognized. Protein names in the biomedical literature show a high degree of morphological and syntactic variation, and the literature refers to proteins through various anaphoric expressions, including null anaphors. We present a biomedical anaphora resolution system, BioAR, that addresses the variation in protein names and further associates them with Swiss-Prot entries as the actual entities in the world. Depending on the specific type of anaphoric expression, the system achieves 59.5% to 75.0% precision and 40.7% to 56.3% recall. We apply BioAR to the protein names in the biological interactions extracted by our biomedical information extraction system, BioIE, in order to construct protein pathways automatically.
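
The grounding task described above can be illustrated with a deliberately naive sketch. It is not BioAR's method. It assumes a hypothetical toy dictionary from protein names to placeholder entry IDs (standing in for Swiss-Prot entries), absorbs surface variation only by lowercasing and stripping non-alphanumeric characters, and resolves a few hypothetical anaphoric expressions ("it", "the protein", "this protein") to the most recently grounded protein name. The names `normalize`, `ground`, `NAME_TO_ENTRY`, and `ANAPHORS` are illustrative only.

```python
from __future__ import annotations

import re


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumeric characters, so "P-53" and "p53" match."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Hypothetical toy dictionary: protein name -> placeholder entry ID.
# The IDs are illustrative stand-ins, not real Swiss-Prot accessions.
NAME_TO_ENTRY = {
    normalize(name): entry
    for name, entry in {
        "p53": "ENTRY_A",
        "tumor suppressor p53": "ENTRY_A",
        "MDM2": "ENTRY_B",
    }.items()
}

# Hypothetical anaphoric expressions handled by this sketch.
ANAPHORS = {"it", "the protein", "this protein"}


def ground(mentions: list[str]) -> list[tuple[str, str | None]]:
    """Map each mention to an entry ID.

    Names are looked up after normalization; anaphors take the entry of the
    most recently grounded name (a naive recency heuristic). Unknown names
    and anaphors with no antecedent map to None.
    """
    last_entry: str | None = None
    results: list[tuple[str, str | None]] = []
    for mention in mentions:
        if mention.lower() in ANAPHORS:
            # Anaphor: reuse the most recent grounded antecedent, if any.
            results.append((mention, last_entry))
            continue
        entry = NAME_TO_ENTRY.get(normalize(mention))
        if entry is not None:
            last_entry = entry
        results.append((mention, entry))
    return results


if __name__ == "__main__":
    for mention, entry in ground(["P-53", "MDM2", "the protein", "p53"]):
        print(mention, "->", entry)
```

Here "P-53" and "p53" ground to the same placeholder entry, and "the protein" resolves to the entry for MDM2 because it is the nearest preceding grounded name. A recency heuristic like this ignores syntax, semantics, and null anaphors, so it only hints at the problem a real resolver has to handle.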

Similar resources

BioThesaurus: a web-based thesaurus of protein and gene names

BioThesaurus is a web-based system designed to map a comprehensive collection of protein and gene names to protein entries in the UniProt Knowledgebase. Currently covering more than two million proteins, BioThesaurus consists of over 2.8 million names extracted from multiple molecular biological databases according to the database cross-references in iProClass. The BioThesaurus web s...

The Effect of Anaphor and Ellipsis Resolution on Proximity Searching in a Text Database

So far, methods for ellipsis and anaphor resolution have been developed and the effects of anaphor resolution have been analyzed in the context of statistical information retrieval (IR) of scientific abstracts. No significant improvement has been observed. In this study, the effects of ellipsis and anaphor resolution on proximity searching in a full text database are analyzed. Anaphora and elli...

I-3: Human Y Chromosome Proteome Project 2012 Update

The Human Genome Project has generated a blueprint for the approximately 20,300 gene-encoded proteins potentially active in any of 230 cell types that make up the human body (human proteome). However, based on the UniProtKB/Swiss-Prot database content, about 6000 of at the protein level; for many others, there is very little information related to protein function, abundance, subcellular locali...

Quantitative Assessment of Dictionary-based Protein Named Entity Tagging

Objective: Natural language processing (NLP) approaches have been explored to manage and mine information recorded in biological literature. A critical step for biological literature mining is biological named entity tagging (BNET) that identifies names mentioned in text and normalizes them with entries in biological databases. The aim of this study was to provide quantitative assessment of the...

Publication date: 2004